A Reusability-Aware Cache Memory Sharing Technique for High Performance CMPs with Private L2 Caches

Authors

  • Sungjune Youn
  • Hyunhee Kim
  • Jihong Kim
Abstract

For high-performance chip multiprocessors (CMPs) to reach their full performance potential, efficient support for the memory hierarchy is essential. Since off-chip accesses incur a long latency, high-performance CMPs typically rely on multiple levels of on-chip cache memories; for example, most current CMPs provide two levels of on-chip caches. While the L1 cache architecture of these CMPs is almost identical across designs, the L2 cache organization varies considerably: all processors in the CMP may share a single L2 cache, or each processor may have its own private L2 cache. A private L2 cache offers a short access time, whereas a shared L2 cache can better adapt to the varying memory requirements of each processor. In this paper, we propose an on-chip L2 cache organization that combines the advantages of both a private and a shared L2 cache. In essence, our L2 cache organization is based on a private L2 organization, which provides the short access latency. When a cache block in the private L2 cache is selected for eviction, the proposed organization first evaluates the reusability of that block. If the block is likely to be reused, the evicted block is saved in one of the peer L2 caches that may have invalid blocks. By selectively writing evicted cache blocks to peer L2 caches, the proposed L2 cache organization can effectively emulate a shared L2 cache. Experimental results using a CMP simulator showed that the proposed L2 cache organization improved the average memory latency by 14% on average over a pure private L2 cache organization for the SPLASH-2 benchmark programs.
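To make the eviction-time decision concrete, the following is a minimal Python sketch of a reusability-aware spill policy in the spirit of the abstract. The class names, the per-block saturating reuse counter, and the "first peer with an invalid way" selection rule are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of reusability-aware spilling between private L2 caches.
    # The per-block saturating reuse counter and the "first peer with an invalid
    # way" selection rule are illustrative assumptions, not the paper's design.

    class Block:
        def __init__(self, tag):
            self.tag = tag            # block address >> 6 (64-byte blocks)
            self.reuse = 0            # saturating reuse counter, bumped on hits

    class PrivateL2:
        WAYS, SETS = 8, 256

        def __init__(self):
            # each set holds at most WAYS blocks; missing entries model invalid ways
            self.sets = [dict() for _ in range(self.SETS)]

        @staticmethod
        def block_tag(addr):
            return addr >> 6

        def set_of(self, tag):
            return self.sets[tag % self.SETS]

        def lookup(self, addr):
            tag = self.block_tag(addr)
            blk = self.set_of(tag).get(tag)
            if blk:
                blk.reuse = min(blk.reuse + 1, 3)
            return blk

        def has_invalid_way(self, tag):
            return len(self.set_of(tag)) < self.WAYS

        def insert(self, blk):
            """Insert a block; return the evicted victim, if any."""
            s = self.set_of(blk.tag)
            victim = None
            if len(s) >= self.WAYS:
                victim = s.pop(next(iter(s)))   # a real design would pick the LRU way
            s[blk.tag] = blk
            return victim

    def l2_access(caches, core, addr):
        """One L2 access by `core`; reusable victims are spilled to a peer cache."""
        local = caches[core]
        if local.lookup(addr):
            return "local hit"
        for peer in caches:                     # probe peer private L2s before memory
            if peer is not local and peer.lookup(addr):
                return "peer hit"
        victim = local.insert(Block(PrivateL2.block_tag(addr)))   # miss: fill locally
        if victim and victim.reuse > 0:         # victim judged likely to be reused
            for peer in caches:
                if peer is not local and peer.has_invalid_way(victim.tag):
                    peer.insert(victim)         # save it in a peer's unused (invalid) way
                    break
        return "miss"

With, say, caches = [PrivateL2() for _ in range(4)], a call to l2_access(caches, 0, addr) returns "local hit", "peer hit", or "miss"; a pure private organization would simply drop every victim instead of performing the spill step.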


Related articles

Understanding the Limits of Capacity Sharing in CMP Private Caches

Chip Multi Processor (CMP) systems present interesting design challenges at the lower levels of the cache hierarchy. Private L2 caches allow easier processor-cache design reuse, thus scaling better than a system with a shared L2 cache, while offering better performance isolation and lower access latency. While some private cache management schemes that utilize space in peer private L2 caches ha...


Balancing Capacity and Latency in CMP Caches

The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache capacity and minimize misses. Others use private L2 caches, replicating data to limit the delay due to global wires and minimize cache access time. Recent hybrid proposals strive to balance latency and capacity, but use ...
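As a rough illustration of the latency/capacity trade-off these schemes try to balance, the short calculation below compares average memory access time (AMAT) for a private and a shared L2. The latencies and miss rates are assumed, purely illustrative figures, not numbers taken from any of the papers listed here.

    # Average memory access time: AMAT = hit_latency + miss_rate * miss_penalty.
    # All figures below (cycles, miss rates) are assumed for illustration only.
    def amat(hit_latency, miss_rate, miss_penalty):
        return hit_latency + miss_rate * miss_penalty

    # Private L2: fast local hits, but the smaller per-core capacity misses more often.
    private_l2 = amat(hit_latency=12, miss_rate=0.10, miss_penalty=300)  # 42.0 cycles
    # Shared L2: slower hits (global wires), but the larger capacity misses less often.
    shared_l2 = amat(hit_latency=25, miss_rate=0.05, miss_penalty=300)   # 40.0 cycles
    print(private_l2, shared_l2)  # hybrid schemes try to get the best of both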


BP-NUCA: Cache Pressure-Aware Migration for High-Performance Caching in CMPs

As the momentum behind Chip Multi-Processors (CMPs) continues to grow, Last Level Cache (LLC) management becomes a crucial issue for CMPs because off-chip accesses incur a long latency. A private cache design is distinguished by lower local access latency, good performance isolation, and easy scalability, and is thus becoming an attractive design alternative for the LLC of CMPs. This paper propose...


Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches

Future CMPs will have more cores and greater on-chip cache capacity. The on-chip cache can either be divided into separate private L2 caches for each core, or treated as a large shared L2 cache. Private caches provide low hit latency but low capacity, while shared caches have higher hit latencies but greater capacity. Victim replication was previously introduced as a way of reducing the average ...



Journal:

Volume   Issue

Pages  -

Publication date: 2007